-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Add support for schema-scoped table functions #18022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add support for schema-scoped table functions #18022
Conversation
This commit adds the ability to register and use table functions (UDTFs) at the schema level, in addition to the existing global registration. Changes: - Extended SchemaProvider trait with 5 new methods for table function management: udtf_names(), udtf(), register_udtf(), deregister_udtf(), and udtf_exist() - Implemented table function storage in MemorySchemaProvider using DashMap - Updated ContextProvider lookup logic to support qualified names (e.g., schema.function_name) - Modified SQL parser to handle qualified table function names Backward Compatibility: - Unqualified names continue to resolve from global registry - Qualified names look up functions in the specified schema only - All new trait methods have default implementations - Zero breaking changes to existing APIs Testing: - Added 5 unit tests for MemorySchemaProvider - Added 3 Rust integration tests verifying qualified/unqualified behavior - Added 5 SQL logic tests - All 13 catalog tests passing
|
FYI - the FunctionRegistry trait already exists |
|
oh, amazing! thanks for pointing me to it, I'll have a thorough look at it soon EDIT: The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as code correctness is concerned, it looks pretty good. I do have the question about making sure this treats these table functions similar to how we treat tables when no qualification is given during registration but then used during query.
| # Test: qualified name for global table function should fail | ||
| # (global table functions are not in schemas, they're in the global registry) | ||
| statement error DataFusion error: Error during planning: Table function 'public.generate_series' not found in schema | ||
| SELECT * FROM public.generate_series(1, 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I register a table without qualification, then I believe I can still access that table via datafusion.public.my_table. Right? If so I think we would want the same behavior for these functions as we do for bare tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That way we won't have a global registry anymore and ctx.register_udtf() would register the table function with the public schema.
Please let me know if that's what we want and I'll add it
datafusion/catalog/src/schema.rs
Outdated
| /// | ||
| /// If a table function of the same name was already registered, returns "Table | ||
| /// function already exists" error. | ||
| #[allow(unused_variables)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Instead of this I'd prefer to use _name and _function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
datafusion/catalog/src/schema.rs
Outdated
| /// schema and returns the previously registered [`TableFunction`], if any. | ||
| /// | ||
| /// If no `name` table function exists, returns Ok(None). | ||
| #[allow(unused_variables)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment re: unused_variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Which issue does this PR close?
Closes #18021.
Rationale for this change
Currently, table functions (UDTFs) can only be registered globally via
SessionContext::register_udtf(). This means all table functions share a global namespace, which can lead to naming conflicts and makes it difficult to organize functions by schema.This PR adds support for registering table functions at the schema level, allowing users to organize table functions by schema while maintaining full backward compatibility with existing global registration.
What changes are included in this PR?
Extended
SchemaProvidertrait with 5 new methods for table function management:udtf_names()- list table functions in the schemaudtf()- retrieve a specific table functionregister_udtf()- register a table function to the schemaderegister_udtf()- remove a table function from the schemaudtf_exist()- check if a table function existsImplemented table function storage in
MemorySchemaProviderusingDashMapfor thread-safe concurrent accessUpdated
ContextProviderlookup logic to support qualified table function names (e.g.,schema.function_nameorcatalog.schema.function_name)Modified SQL parser to correctly handle qualified table function names in SQL queries
Backward Compatibility:
SELECT * FROM my_func()) continue to resolve from the global registrySELECT * FROM public.my_func()) look up functions in the specified schema onlyAre these changes tested?
Yes, comprehensive tests have been added:
MemorySchemaProvidercovering registration, deregistration, retrieval, and error casesAll existing tests continue to pass (369/369).
Are there any user-facing changes?
New functionality (non-breaking):
register_udtf()on aSchemaProviderinstanceSELECT * FROM myschema.my_function(args)Backward compatibility:
ctx.register_udtf()continues to work unchangedExample usage:
Open Questions
1. SchemaProvider API Design
The current implementation adds 5 new methods directly to the
SchemaProvidertrait:udtf_names(),udtf(),register_udtf(),deregister_udtf(),udtf_exist()Concern: As we add support for other UDF types (scalar UDFs, aggregate UDFs, window UDFs) to schemas, this approach will significantly expand the
SchemaProviderinterface, potentially making it noisy and harder to maintain.Alternative approach: Introduce a
UdfContainerorFunctionRegistrytrait that manages all types of user-defined functions:Benefits:
SchemaProviderinterfaceTradeoffs:
Question: Should we refactor to use a
FunctionRegistrypattern before merging, or is the current direct-method approach acceptable?2. Information Schema Exposure
Should schema-scoped table functions be exposed via
information_schema.routinesor a newinformation_schema.table_functionstable?Arguments for:
Arguments against:
Current state: Not implemented in this PR.
Question: Is information_schema support desired for the initial implementation, or can it be added in a follow-up PR?
Feedback on these design decisions would be appreciated before finalizing this PR.